Search Results: "Axel Beckert"

6 October 2013

Axel Beckert: Searching in Screen's copy mode

I have been using GNU Screen daily for well over a decade, and I became maintainer of Debian's screen package almost exactly two years ago. Nevertheless it still happens occasionally that I discover features yet unknown to me. Recently I had one of these moments again: I was looking for a specific line in the long output of a command which had run inside a Screen session. I entered Screen's copy mode with Ctrl-A [ and scrolled around with the arrow keys and the page-up and page-down keys, but didn't find it. I thought it would be nice if I could search for the string I was looking for. Intuitively I typed /, the search string, and pressed Enter. And it worked! It jumped to the next occurrence of that string. Of course I immediately had to check whether tmux has such a feature, too. It does, but it seems to be a less sophisticated implementation:
Feature                                              Key binding in GNU Screen   Key binding in tmux
Switch into copy/scroll mode
  (needed for the remainder)                         Ctrl-A [                    Ctrl-B [
Search for string once, forward                      / + string + Enter          Ctrl-S + string + Enter
Search for string once, backward                     ? + string + Enter          Ctrl-R + string + Enter
Search for string again, forward                     / Enter                     Ctrl-S Enter
Search for string again, backward                    ? Enter                     Ctrl-R Enter
Incremental search for string, forward               Ctrl-S + string             -
Incremental search for string, backward              Ctrl-R + string             -
(Incremental) search for next occurrence, forward    Ctrl-S again                -
(Incremental) search for next occurrence, backward   Ctrl-R again                -
Being able to do incremental search, like in GNU Emacs, gave me yet another reason to keep using Screen and not to switch to tmux. ;-)

2 October 2013

Axel Beckert: How to make wget honour Content-Disposition headers

Download links often point to CGI scripts which actually generate (or just fetch, i.e. proxy) the actual file to be downloaded, e.g. URLs like http://www.example.com/download.cgi?file=foobar.txt. Most such CGI scripts send the real file name in the Content-Disposition header as specified in the MIME specification. All browsers I know (well, at least those I use regularly :-) handle that perfectly and propose the file name sent in the Content-Disposition header as the file name for saving the downloaded file, which is usually exactly what I want. All browsers do that, just not my favourite command-line download tool, GNU Wget. Downloading the above URL with wget looks like this with default settings:
$ wget 'http://www.example.com/download.cgi?file=foobar.txt'
--2013-10-02 16:04:16--  http://www.example.com/download.cgi?file=foobar
Resolving www.example.com (www.example.com)... 93.184.216.119, 2606:2800:220:6d:26bf:1447:1097:aa7
Connecting to www.example.com (www.example.com)|2606:2800:220:6d:26bf:1447:1097:aa7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2020 (2.0K) [text/plain]
Saving to: 'download.cgi?file=foobar.txt'
100%[============================================>] 2,020       --.-K/s   in 0s
2013-10-02 16:04:24 (12.5 MB/s) - 'download.cgi?file=foobar.txt' saved [2020/2020]
Meh! But luckily Wget can do that, it's just not enabled by default because it's an experimental and possibly buggy feature, at least according to the man page. Well, works for me! :-) You can easily enable it by default, either for your user or for the whole system, by placing the following line in your ~/.wgetrc or /etc/wgetrc:
content-disposition = on
Provided the CGI script sends an appropriate Content-Disposition header, the above output now looks like this:
$ wget 'http://www.example.com/download.cgi?file=foobar.txt'
--2013-10-02 16:04:16--  http://www.example.com/download.cgi?file=foobar
Resolving www.example.com (www.example.com)... 93.184.216.119, 2606:2800:220:6d:26bf:1447:1097:aa7
Connecting to www.example.com (www.example.com)|2606:2800:220:6d:26bf:1447:1097:aa7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2020 (2.0K) [text/plain]
Saving to: 'foobar.txt'
100%[============================================>] 2,020       --.-K/s   in 0s
2013-10-02 16:04:24 (12.5 MB/s) - 'foobar.txt' saved [2020/2020]
Now Wget does what I mean! You can also set this as a flag on the command line, but typing wget --content-disposition every time is surely not what I want. ;-)
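For a one-off download, the flag mentioned above does the same without touching any configuration file (same example URL as before):
$ wget --content-disposition 'http://www.example.com/download.cgi?file=foobar.txt'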

2 May 2013

Axel Beckert: New web browsers in Wheezy

Since there is so much nice new stuff in Debian Wheezy, I have to split up my contributions to Mika's #newinwheezy game on Planet Debian. Here's the next bunch, this time web browsers:
Dillo Screenshot
dillo
The FLTK-based lightweight GUI web browser Dillo, which comes with its own rendering engine (no JavaScript, incomplete CSS support), was already in Debian before, but was removed before the release of Debian Squeeze because Dillo 2 relied on FLTK 2.x, which had an unclear license situation back then and never made it into Debian. In the meantime Dillo 3 relies on FLTK 1.3, as FLTK upstream abandoned the 2.0 branch and continued development on the 1.3 branch. So I brought Dillo back into Debian with its 3.0.x release.

Netsurf Screenshot
netsurf
The RiscOS-originating lightweight GUI web browser Netsurf was already in Debian, too, but didn't make it into Debian Squeeze as it needed the Lemon parser generator (part of the SQLite source) to build back then, and a change in Lemon caused Netsurf to no longer build properly at just the wrong moment. Netsurf supports CSS 2.1, but has no JavaScript support either. I'd consider its rendering engine more complete than Dillo's.

XXXTerm Screenshot
surf and xxxterm
Surf and XXXTerm are both simple and minimalistic webkit-based browsers. Surf is easy to embed in other applications and XXXTerm features vi-like keybindings for heavy keyboard users.
To be continued ;-)

Axel Beckert: New SSH-related stuff in Wheezy

Mika had the nice idea of doing a #newinwheezy game on Planet Debian, so let's join in: there are (at least) two new SSH-related tools in Debian Wheezy:
mosh
is the 'mobile shell', a UDP-based remote terminal application which works better than SSH in case of lag, packet loss or other forms of bad connection. I wrote about mosh in more detail about a year ago. mosh is also available for Debian Squeeze via squeeze-backports.
sshuttle
is somewhere between port forwarding and a VPN. It allows forwarding arbitrary TCP connections over an SSH connection without the need to configure individual port forwardings. It does not need root access on the server side either. I wrote about sshuttle in more detail about a year ago.
To be continued ;-)

20 March 2013

Jan Wagner: Chemnitzer Linux-Tage 2013!

This year the Debian project was again present at the Chemnitzer Linux-Tage, this time right next to the debianforum.de booth. The booth folks arriving on Friday organized a flashmob at Expitas after the booth setup. Unfortunately our second planned flashmob at the mensa was thwarted by far too many students, so we ended up in the Turm-Brauhaus, which is a great location with good drinks, but the service was very brusque. On the next two days at the booth we chatted and discussed a wide variety of questions with visitors and other exhibitors, including 'When will (the next Debian version) be released?' and 'Are there installation disks available?'. The answers were, as always, 'When we are ready and have reached the quality level we defined' and 'No, we don't have installation media, as they are always outdated. Do you have a USB dongle with you?'. Merchandise was requested by visitors as always, but we only had some leftovers from FOSDEM, brought by Axel. The demonstration was, as usual, a small box running Babelbox and xpenguins, which had worked out in the previous years too. This year there were three lectures held by Debian-related people, about Debian GIS, 'Aptitude - known but even unknown' and 'SSH and unreliable network connections'. The organisation team did a really great job. The social event on Saturday night was very exciting and we left it early in the morning. The whole event was indeed fun and a pleasure to find new friends and meet old ones from the Free Software community. Many thanks to Florian Baumann, Jan Dittberner, Andreas Tille, Christian Hoffmann, Axel Beckert, Markus Rekkenbeil, Daniel Schier, Jonas Genannt, Jan H rsch and kurio for taking care of and running the booth, which from my point of view worked out extremely smoothly this year. As in previous years, a special thanks to TMT GmbH & Co. KG, which kindly donated additional booth tickets, the equipment, its transportation and accommodation for almost half of the booth staff.

10 March 2013

Axel Beckert: Rendering Markdown, Asciidoc and Friends automatically while Editing

Partially because Markdown is GitHub's markup format of choice, I enjoy writing documents in simple markup formats more and more. There is, though, one common annoyance with these formats compared to writing plain HTML.

The Annoyance

They need to be rendered (i.e. more or less compiled) before you can view your outpourings rendered, e.g. in the web browser. So the workflow usually is:
  1. Save the current file in your favourite editor
  2. Switch to terminal with commandline
  3. Cursor up, Enter
  4. Switch to your favourite web browser
  5. Hit the reload button
Using a Specialized Editor with Live Preview

One choice would be to use a specific editor with live rendering. The one I know of in Debian (from Wheezy on) is ReText (Debian package retext). It supports Markdown and reStructuredText. But as with most simple GUI editors, I miss many of the advanced editing commands possible with Emacs.

Using Emacs Markdown Mode

Then there is the Markdown Mode for Emacs (part of Debian's emacs-goodies-el package), where you can get a preview by pressing C-c C-c p. But for some reason this takes several seconds, opens a new buffer and window with the rendered HTML code and then starts (hardcoded) Firefox (which is not my preferred web browser). And if you do that a second time without closing Firefox first, it won't just reload the file but will open a new tab. You might think that just hitting reload should suffice. But no, the new tab has a different file name, so reload doesn't help. Additionally it may not use my preferred Markdown implementation. Meh. Well, I probably could fix all those issues with Markdown Mode, it's only Emacs Lisp. Heck, the called command is even configurable. But fixing at least four issues to fix one workflow annoyance? Maybe some other time, but not as long as there are other nice choices.

Using inotifywait to Render on Write

So every time you save the currently edited file, you immediately want to re-render the same HTML file from it. This can be easily automated by using the Linux inotify kernel subsystem, which notices changes to the filesystem and reports them to applications which ask for it. One such tool is inotifywait, which can either output all or just specific events, or just exit when the first requested event occurs. With the latter it's easy to write a while loop on the command line which regenerates a file after every write access. I use either Pandoc or Asciidoc for that since both generate full HTML pages including header and footer, but you can use that also with Markdown to render just the HTML body. Most browsers render it correctly anyway:
while inotifywait -q -e modify index.md; do pandoc -s -f markdown -t html -o index.html index.md; done
while inotifywait -q -e modify index.txt; do asciidoc index.txt; done
while inotifywait -q -e modify index.md; do markdown index.md > index.html; done
This solution is even editor- and build-system-agnostic (but not operating-system-agnostic). inotifywait is part of inotify-tools, a useful set of command-line tools to interface with inotify. They're packaged in Debian as inotify-tools, too.

Using mdpress for Markdown plus Impress.js based Slides

The Ruby-written mdpress is a special case of the previous approach. It's a command-line tool to convert Markdown into Impress.js-based slide shows, and it has an option named --automatic which causes it to keep running and automatically update the presentation as soon as changes are made to the Markdown file. mdpress is not yet in Debian, but there's an ITP for it and Impress.js itself recently entered Debian as libjs-impress. Nevertheless, two dependencies (highlight.js, ITP'ed; ruby-launchy, ITP'ed) are still missing in Debian.
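Judging from the description above, an invocation could look roughly like this; slides.md is just a placeholder file name and I haven't verified the exact command line against mdpress itself:
$ mdpress --automatic slides.md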

Axel Beckert: Up to date Aptitude Documentation Online

Aptitude ships documentation in 7 languages as HTML files. However, the latest version available online was 0.4.11.2 from 2008, hosted on the server of the previous, now unfortunately inactive Aptitude maintainer, and it only covered 5 languages. This lack of up-to-date online documentation even caused others to put more up-to-date versions online. Nevertheless they age, too, and the one I'm aware of is not up to date for Wheezy. So the idea was born to keep an up-to-date version online on Aptitude's Alioth webspace (which currently redirects to a subdirectory of the previous maintainer's personal website). But unfortunately we, the current Aptitude team, still lack administrative rights on Aptitude's Alioth project, which would be necessary to assign new team members who could work on that. As an intermediate step, there's now a (currently ;-) up-to-date Aptitude User's Manual online in all 7 languages at

http://people.debian.org/~abe/aptitude/ and English at

http://people.debian.org/~abe/aptitude/en/

As this location could also suffer from the same MIA issues as any other personal copy, the plan is to move this to somewhere under http://aptitude.alioth.debian.org/ as soon as we have full access to Aptitude's Alioth project. Our plans for then are:

P.S.: Anyone interested in doing a German translation of the Aptitude User's Manual? Sources are in DocBook, i.e. XML, and available via Git.

21 November 2012

Axel Beckert: Suggestions for the GNOME Team

Thanks to Erich Schubert's blog posting on Planet Debian I became aware of the 2012 GNOME User Survey at Phoronix. Like back in 2006, I still use some GNOME applications, so I do consider myself a GNOME user in the widest sense and hence I filled out that survey. Additionally I have to live with GNOME 3 as a system administrator of workstations, and that's some kind of usage, too. ;-) The last question in the survey was 'Do you have any comments or suggestions for the GNOME team?' Sure I have. And since I tried to give constructive feedback instead of only ranting, here's my answer to that question as I submitted it in the survey, too, just spiced up with some hyperlinks and highlighting:
Don't try to change the users. Give the users more possibilities to change GNOME if they don't agree with your own preferences and decisions. (The trend towards castrating the user already started with GNOME 2 and GNOME 3 made that worse IMHO.) If you really think that you need less configurability because some non-power-users are confused or challenged by too many choices, then please give the other users at least the chance to enable more configuration options. A very good example in that regard was Kazehakase (RIP), which offered several user interfaces (novice, intermediate and power user or such). The popular text-mode web browser Lynx does the same, too, btw. GNOME lost me mostly with the change to GNOME 2. The switch from Galeon 1.2 to 1.3/2.0 was horrible and the later switch to Epiphany made things even worse on the browser side. My short trip to GNOME as a desktop environment ended with moving back to FVWM (configurable without tons of clicking, especially after moving to some other computer), and for the browser I moved on to Kazehakase back then. Nowadays I'm living very well with Awesome and Ratpoison as window managers, Conkeror as web browser (which are all very configurable) and a few selected GNOME applications like Liferea (luckily still quite configurable, although I miss Gecko's about:config since the switch to WebKit), GUCharmap and Gnumeric. For people switching from Windows I nowadays recommend XFCE, or maybe LXDE on low-end computers. I would likely recommend GNOME 2, too, if it still existed. With regards to MATE I'm skeptical about its persistence and future, but I'm glad it exists as it solves a lot of problems and brings in just a few new ones. Cinnamon as well as SolusOS are based on the current GNOME libraries and are very likely the more persistent projects, but also very likely have the very same multi-head issues we're all barfing about at work with Ubuntu Precise. (Heck, am I glad that I use Awesome at work, too, and all four screens work perfectly as they did with FVWM before.)
Thanks to Dirk Deimeke for his pointer (written in German) to Marcus Moeller's interview with Ikey Doherty (in German, too) about his Debian-/GNOME-based distribution SolusOS.

Axel Beckert: zutils: zcat and friends on Steroids

I recently wrote about tools to handle archives conveniently. If you just have to handle compressed text files, there are some widely known shortcut commands to mimic common commands on files compressed with a specific compression format.
         gzip     bzip2     lzma      xz
cat      zcat     bzcat     lzcat     xzcat
cmp      zcmp     bzcmp     lzcmp     xzcmp
diff     zdiff    bzdiff    lzdiff    xzdiff
grep     zgrep    bzgrep    lzgrep    xzgrep
egrep    zegrep   bzegrep   lzegrep   xzegrep
fgrep    zfgrep   bzfgrep   lzfgrep   xzfgrep
more     zmore    bzmore    lzmore    xzmore
less     zless    bzless    lzless    xzless
In Debian and derivatives, those tools are part of the package for the corresponding compression utility, i.e. the zcat command is part of the gzip package and the xzfgrep command is part of the xz-utils package. But although this matrix is quite easy to remember, the situation has a few drawbacks. This is where the zutils project comes in: zutils provides the functionality of most of these utilities, too, but with one big difference: you don't have to remember, think about or type which compression method has been used for your data. Just use zcat, zcmp, zdiff, zgrep, zegrep, or zfgrep and it works independently of which compression method has been used, if any, or whether different compression types are mixed in the parameters to the same command:
$ zfgrep foobar bla.txt fnord.gz hurz.xz quux.lz bar.lzma
Especially if you use logrotate and let logrotate compress old logs, it's very convenient that one command suffices to concatenate all the available log files, including the current uncompressed one:
$ zcat /var/log/syslog*    
Additionally, the zutils versions of these tools also support lzip-compressed files. The zutils package is available in Debian starting with Wheezy and in Ubuntu since Oneiric. When installed, it replaces the original z* utilities from the gzip package by diverting them away. The only drawback so far is that there is neither a zless nor a zmore utility from the zutils project, so zless bla.txt fnord.gz hurz.xz quux.lz bar.lzma will not work as expected even after installing zutils, as zless is still the one from the gzip package and hence will show you just the first two files in plain text, but not the remaining ones.
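If you are curious which files zutils has taken over, dpkg can list the diversions mentioned above (this is a generic dpkg-divert query, nothing zutils-specific):
$ dpkg-divert --list | grep -i zutils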

16 November 2012

Axel Beckert: Useful but Unknown Unix Tools: dwdiff better than wdiff + colordiff

A year ago, in 'Useful but Unknown Unix Tools: How wdiff and colordiff help to choose the right Swiss Army Knife', I wrote about using wdiff and colordiff together. Colordiff'ed wdiff output looks like this:
$ wdiff foobar.txt barfoo.txt | colordiff
[-foo-]bar fnord
gnarz hurz quux
bla {+foo+} fasel
But if you have colour, why keep these hard-to-read wdiff markers in the text at all? There is a tool named dwdiff which can do word diffs in colour without textual markers and with even less to type (and without being git diff --color-words ;-). Actually it looks like git diff --color-words, just without the git:
$ dwdiff -c foobar.txt barfoo.txt
foo bar fnord
gnarz hurz quux
bla foo fasel
Another cool thing about dwdiff (and its name-giving feature) is that you can define what you consider whitespace, i.e. which character(s) delimit the words. So let's do the example above again, but this time declare that f is considered the only whitespace character:
$ dwdiff -W f -c foobar.txt barfoo.txt
foo bar bar fnord
gnarz hurz quux
bla foo fasel
dwdiff can also show line numbers:
$ dwdiff -c -L foobar.txt barfoo.txt
   1:1    foo bar fnord
   2:2    gnarz hurz quux
   3:3    bla foo fasel
$ dwdiff -c -L foobar.txt quux.txt
   1:1    foo bar fnord
   1:2    foobar floedeldoe
   2:3    gnarz hurz quux
   3:4    bla foo fasel
(coloured shell screenshots by aha)

15 November 2012

Axel Beckert: Tools to handle archives conveniently

TL;DR: There's a summary at the end of the article. Today I wanted to see why a dependency of a .deb package from an external APT repository changed so that it became uninstallable. While dpkg-deb --info foobar.deb easily shows the control information, the changelog is in the filesystem part of the package. I could extract that one with dpkg-deb, too, but I'd have to either extract to some temporary directory or pipe it into tar, which can then extract a single file from the archive and send it to STDOUT:
dpkg-deb --fsys-tarfile foobar.deb | tar xOf - ./usr/share/doc/foobar/changelog.Debian.gz | zless
But that's tedious to type. The following command is clearly less to type and way easier to remember:
acat foobar.deb ./usr/share/doc/foobar/changelog.Debian.gz | zless
acat stands for 'archive cat' and is part of the atool suite of commands:
als
lists files in an archive.
$ als foobar.tgz
drwxr-xr-x abe/abe           0 2012-11-15 00:19 foobar/
-rw-r--r-- abe/abe          13 2012-11-15 00:20 foobar/bar
-rw-r--r-- abe/abe          13 2012-11-15 00:20 foobar/foo
acat
extracts files in an archive to standard out.
$ acat foobar.tgz foobar/foo foobar/bar
foobar/bar
bar contents
foobar/foo
foo contents
adiff
generates a diff between two archives using diff(1).
$ als quux.zip
Archive:  quux.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2012-11-15 00:23   quux/
       16  2012-11-15 00:22   quux/foo
       13  2012-11-15 00:20   quux/bar
---------                     -------
       29                     3 files
$ adiff foobar.tgz quux.zip
diff -ru Unpack-3594/foobar/foo Unpack-7862/quux/foo
--- Unpack-3594/foobar/foo      2012-11-15 00:20:46.000000000 +0100
+++ Unpack-7862/quux/foo        2012-11-15 00:22:56.000000000 +0100
@@ -1 +1 @@
-foo contents
+foobar contents
arepack
repacks archives to a different format. It does this by first extracting all files of the old archive into a temporary directory, then packing all files extracted to that directory to the new archive. Use the --each (-e) option in combination with --format (-F) to repack multiple archives using a single invocation of atool. Note that arepack will not remove the old archive.
$ arepack foobar.tgz foobar.txz
foobar.tgz: extracted to `Unpack-7121/foobar'
foobar.txz: grew 36 bytes
apack
creates archives (or compresses files). If no file arguments are specified, filenames to add are read from standard in.
aunpack
extracts files from an archive. Often one wants to extract all files in an archive to a single subdirectory. However, some archives contain multiple files in their root directories. The aunpack program overcomes this problem by first extracting files to a unique (temporary) directory, and then moving its contents back if possible. This also prevents local files from being overwritten by mistake.
(The atool subcommand descriptions are from the atool man page, which is licensed under GPLv3+. Examples by me.) I do, though, miss the existence of an agrep subcommand. Guess why? atool supports a wealth of archive types: tar (gzip-, bzip-, bzip2-, compress-/Z-, lzip-, lzop-, xz-, and 7zip-compressed), zip, jar/war, rar, lha/lzh, 7zip, alzip/alz, ace, ar, arj, arc, rpm, deb, cab, gzip, bzip, bzip2, compress/Z, lzip, lzop, xz, rzip, lrzip and cpio. (Not all subcommands support all archive types.)

Similar Utilities

There are some utilities which cover parts of what atool does, too:

Tools from the mtools package

Yes, they come from the package of tools for handling MS-DOS floppy disks, don't ask me why. :-)
uz
gunzips and extracts a gzip'd tar'd archive
Advantage over aunpack: Less to type. :-)
Disadvantage compared to aunpack: Supports only one archive format.
lz
gunzips and shows a listing of a gzip'd tar'd archive
Advantage over als: One character less to type. :-)
Disadvantage compared to als: Supports only one archive format.
unp
unp extracts one or more files given as arguments on the command line.
$ unp -s
Known archive formats and tools:
7z:           p7zip or p7zip-full
ace:          unace
ar,deb:       binutils
arj:          arj
bz2:          bzip2
cab:          cabextract
chm:          libchm-bin or archmage
cpio,afio:    cpio or afio
dat:          tnef
dms:          xdms
exe:          maybe orange or unzip or unrar or unarj or lha 
gz:           gzip
hqx:          macutils
lha,lzh:      lha
lz:           lzip
lzma:         xz-utils or lzma
lzo:          lzop
lzx:          unlzx
mbox:         formail and mpack
pmd:          ppmd
rar:          rar or unrar or unrar-free
rpm:          rpm2cpio and cpio
sea,sea.bin:  macutils
shar:         sharutils
tar:          tar
tar.bz2,tbz2: tar with bzip2
tar.lzip:     tar with lzip
tar.lzop,tzo: tar with lzop
tar.xz,txz:   tar with xz-utils
tar.z:        tar with compress
tgz,tar.gz:   tar with gzip
uu:           sharutils
xz:           xz-utils
zip,cbz,cbr,jar,war,ear,xpi,adf: unzip
zoo:          zoo
So it's very similar to aunpack, just a shorter command, and it supports some more exotic archive formats which atool doesn't support. Also part of the unp package is ucat, which does more or less the same as acat, just with unp as backend.

dtrx

From the man page of dtrx:
In addition to providing one command to extract many different archive types, dtrx also aids the user by extracting contents consistently. By default, everything will be written to a dedicated directory that's named after the archive. dtrx will also change the permissions to ensure that the owner can read and write all those files. Supported archive formats: tar, zip (including self-extracting .exe files), cpio, rpm, deb, gem, 7z, cab, rar, and InstallShield. It can also decompress files compressed with gzip, bzip2, lzma, or compress.
dtrx -l lists the contents of an archive, i.e. works like als or lz. dtrx has two features not present in the other tools mentioned so far: it can recursively extract archives contained in archives, and it can extract the metadata of some package files (e.g. .deb) instead of their normal contents. Unfortunately you can't mix those two features. But you can use the following tool for that purpose:

deepfind

deepfind is a command from the package strigi-utils and recursively lists files in archives, including archives in archives. I've already written a detailed blog posting about deepfind and its friend deepgrep.

tardiff

tardiff was written to check what changed in source code tarballs from one release to another. By default it just lists the differences in the file lists, not in the files' contents, and hence works differently than adiff.

Summary

atool and friends are probably the first choice when it comes to DWIM archive handling, also because they have an easy-to-remember subcommand scheme. uz and lz are the shortest way to extract or list the contents of a .tar.gz file. But nothing more. And you have to install mtools even if you don't have a floppy drive. unp comes in handy for exotic archive formats atool doesn't support. And it's way easier to remember and type than aunpack. dtrx is neat if you want to extract archives in archives or if you want to extract metadata from some package files with just a few keystrokes. For listing all files in recursive archives, use deepfind.

30 August 2012

Axel Beckert: deepgrep: grep nested archives with one command

Several months ago I wrote about 'grep everything' and listed grep-like tools which can grep through compressed files or specific data formats. The blog posting sparked several magazine articles and talks by Frank Hofmann and me. Frank recently noticed that we had nevertheless missed one more or less mighty tool so far. We missed it because it's mostly unknown, undocumented and hidden behind a package name which doesn't suggest a real recursive 'grep everything':

deepgrep

deepgrep is part of the Debian package strigi-utils, a package which contains utilities related to the KDE desktop search Strigi. deepgrep especially eases searching through tarballs, even nested ones, but can also search through zip files and OpenOffice.org/LibreOffice documents (which are actually zip files). deepgrep seems to support at least the following archive and compression formats: A search in an archive which is deeply nested looks like this:
$ deepgrep bar foo.ar
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt:foobar
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt:bar
deepgrep, though, neither seems to support any LZMA-based compression (lzma, xz, lzip, 7z), nor does it support lzop, rzip, compress (.Z suffix), cab, cpio, xar, or rar. deepgrep has some further current drawbacks, too.

deepfind

If you just need the file names of the files in nested archives, the package also contains the tool deepfind, which does nothing else than list all files and directories in a given set of archives or directories:
$ deepfind foo.ar
foo.ar
foo.ar/foo.tar
foo.ar/foo.tar/foo.tar.gz
foo.ar/foo.tar/foo.tar.gz/foo.zip
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt
As with deepgrep, deepfind does not implement any of the common options of its normal sister tool find.

Dependencies

The package strigi-utils doesn't pull in the complete Strigi framework (i.e. no daemon), just a few libraries (libstreams, libstreamanalyzer, and libclucene). On Wheezy it also pulls in some audio/video decoding libraries, which may make some server administrators less happy.

Conclusion

Both tools are quite limited to some basic use cases, but can be worth a fortune if you have to work with nested archives. Nevertheless the claim in the Debian package description of strigi-utils that they're enhanced versions of their well-known counterparts is IMHO disproportionate. Most of the missing features and documentation can be explained by the primary purpose of these tools: being a backend for desktop searches. I guess there wasn't much need for proper command-line usage yet. Until now. ;-)

42.zip

And yes, I was curious enough to let deepfind have a look at 42.zip (the one from SecurityFocus; unzip seems unable to unpack 42.zip from unforgettable.dk due to a missing version compatibility), and since it just traverses the archive sequentially, it has no problem with that, needing just about 5 MB of RAM and a lot of time:
[...]
42.zip/libf.zip/bookf.zip/chapterf.zip/docf.zip/pagee.zip
42.zip/libf.zip/bookf.zip/chapterf.zip/docf.zip/pagee.zip/0.dll
42.zip/libf.zip/bookf.zip/chapterf.zip/docf.zip/pagef.zip
42.zip/libf.zip/bookf.zip/chapterf.zip/docf.zip/pagef.zip/0.dll
deepfind 42.zip  11644.12s user 303.89s system 97% cpu 3:24:02.46 total
I won't, though, try deepgrep on 42.zip. ;-)

15 July 2012

Debian Med: Debian Med Bits: Report from LSM Geneva by Andreas Tille

In this report from LSM 2012 in Geneva I will report about
  1. Medical imaging using Debian
  2. Debian Med packaging workshop
  3. Integration of VistA into Debian
  4. Other interesting talks
Medical imaging using Debian

There were about 10 attendees, basically upstream developers of medical imaging software. The talk got some attention and the message to include even more medical imaging software in Debian was well received. Thanks to Mathieu Malaterre there was some live demonstration, which was way easier for him as a medical imaging expert than it would have been for me.
Debian Med packaging workshop

Due to my advertising in the talk the day before, three students (two of them from one medical imaging project, one from another project) attended the workshop. Thanks to Axel Beckert, who helped me out surviving the challenge of walking on unexplored ground.

The idea of the workshop was to ask the attendees to name a package of their own and just package this. Because two of the attendees were upstream developers of CreaTools, we decided to go on with packaging this. After circumventing some pitfalls in the beginning it went rather smoothly, and after about 2.5 hours we were able to commit some initial packaging to the Debian Med Git repository which comes quite close to a ready package (perhaps some split into a library and a development package needs to be done, and for sure testing is needed).
Quoting Frederic Cervenansky, upstream of CreaTools
Thanks for your work. Your workshop was very interesting and didactic: a relevant discussion between Claire and me for the future of Creatools has emerged from the difficulties you encountered to package creatools. I will try, before the end of the month, to fully package creatools. And for sure, I will contact the debian-med mailing list.
Integration of VistA into Debian

I had the good chance to directly address some issues of Claudio Zaugg, the speaker of the talk 'Implementing open source Health Information Systems in Low- and Middle Income Countries - a practical review', which took place directly before mine. It turned out that using Debian-packaged software might help simplify the issues they had in supporting health care workers in low- and middle-income countries.
My talk partly repeated some basic ideas about Debian Med from the talk on Monday, because the audience was completely different. Then I tried to explain in detail how we tried hard to establish good contacts to upstream developers and why this is essential to reach the goal of including hospital information systems straight into Debian, and by doing so open the doors of hospitals for large-scale Debian installations.
There is also a video recording of this talk.

Other interesting talks

OpenEMR, a multi-language free open source electronic health record for international use

We just discussed the packaging of OpenEMR, which is prepared for Debian Med, as can be seen on our tasks page. Contact with the creator of an unofficial package will be established to finalise this task.
OpenFovea: when open-source and biophysical research get married

Just another target for Debian Med popped up in this talk: to further enhance Debian Med in covering all issues of medical care on the one hand, and on the other hand to help upstream authors distribute their code more effectively.
Collaborative software development for nanoscale physics

The talk would have fit very nicely into the Debian Science workshop at ESRF (European Synchrotron Radiation Facility) in Grenoble, because it was about ETRF (European Theoretical Radiation Facility). At previous LSM events I had already talked with Yann, and the work to include their software into Debian is on its way.
Free software and High Performance Computing

This talk was not directly connected to my Debian work, but I simply enjoyed seeing how "two people" had a really entertaining talk about the Top 500 computers. Vittoria, you made my last day at LSM.

5 June 2012

Axel Beckert: Finding similar but not identical files

There are quite a few tools to find duplicate files in Debian, and depending on the task I use either hardlink (see this blog posting), fdupes (if I need output with all identical files on one line; see example below), or duff (if it has to be performant). But for code deduplication in historically grown code you sometimes need a tool which not only finds identical files, but also those which just differ in a few blanks or blank lines. I found two tools in Debian which can give you some kind of percentage of similarity: simhash (which is btw. orphaned; upstream homepage) and similarity-tester (upstream homepage). simhash has the shorter name and hence sounds more usable on the command line. But it seems to only be able to compare two files at once, and also only after first computing and writing its similarity hash to a file. Not really usable for those one-liner cases on the command line. similarity-tester has the longer name (and one which made me suspect that it may be a GUI tool), but provides what I was looking for:
$ find . -type f | sim_text -ipTt 75
This lists all files in the current directory which have at least 75% (-t 75) in common with another file in the list of files. The option -i causes sim_text to read the files to compare from standard input; -p causes sim_text to just output the similarity percentage; and -T suppresses the per-file list of found tokens. I used similarity-tester's sim_text tool, which compares natural language, as most of the files I had to test are shell scripts. But similarity-tester also provides tools to test the similarity of code in specific programming languages, namely C, Java, Pascal, Modula-2, Lisp and Miranda. Example output from the xen-tools project (after I had already done a lot of code deduplication):
./intrepid/30-disable-gettys consists for 100 % of ./edgy/30-disable-gettys material
./edgy/30-disable-gettys consists for 100 % of ./intrepid/30-disable-gettys material
./common/90-make-fstab-rpm consists for 98 % of ./centos-5/90-make-fstab material
./centos-5/90-make-fstab consists for 98 % of ./common/90-make-fstab-rpm material
./gentoo/55-create-dev consists for 91 % of ./dapper/55-create-dev material
./dapper/55-create-dev consists for 90 % of ./gentoo/55-create-dev material
./gentoo/55-create-dev consists for 88 % of ./common/55-create-dev material
./common/90-make-fstab-deb consists for 87 % of ./common/90-make-fstab-rpm material
./common/90-make-fstab-rpm consists for 85 % of ./common/90-make-fstab-deb material
./common/30-disable-gettys consists for 81 % of ./karmic/30-disable-gettys material
./intrepid/80-install-kernel consists for 78 % of ./edgy/80-install-kernel material
./edgy/30-disable-gettys consists for 76 % of ./karmic/30-disable-gettys material
./karmic/30-disable-gettys consists for 76 % of ./edgy/30-disable-gettys material
./common/50-setup-hostname-rpm consists for 76 % of ./gentoo/50-setup-hostname material
Depending on the length of the file names and the number of files, this can be made more readable using the column utility from the bsdmainutils package and reversed using tac from the coreutils package:
$ find . -type f | sim_text -ipTt 75 | tac | column -t
./common/50-setup-hostname-rpm  consists  for  76   %  of  ./gentoo/50-setup-hostname    material
./karmic/30-disable-gettys      consists  for  76   %  of  ./edgy/30-disable-gettys      material
./edgy/30-disable-gettys        consists  for  76   %  of  ./karmic/30-disable-gettys    material
./intrepid/80-install-kernel    consists  for  78   %  of  ./edgy/80-install-kernel      material
./common/30-disable-gettys      consists  for  81   %  of  ./karmic/30-disable-gettys    material
./common/90-make-fstab-rpm      consists  for  85   %  of  ./common/90-make-fstab-deb    material
./common/90-make-fstab-deb      consists  for  87   %  of  ./common/90-make-fstab-rpm    material
./gentoo/55-create-dev          consists  for  88   %  of  ./common/55-create-dev        material
./dapper/55-create-dev          consists  for  90   %  of  ./gentoo/55-create-dev        material
./gentoo/55-create-dev          consists  for  91   %  of  ./dapper/55-create-dev        material
./centos-5/90-make-fstab        consists  for  98   %  of  ./common/90-make-fstab-rpm    material
./common/90-make-fstab-rpm      consists  for  98   %  of  ./centos-5/90-make-fstab      material
./edgy/30-disable-gettys        consists  for  100  %  of  ./intrepid/30-disable-gettys  material
./intrepid/30-disable-gettys    consists  for  100  %  of  ./edgy/30-disable-gettys      material
Compared to that, fdupes only finds the two 100% identical files:
$ fdupes -r1 . 
./intrepid/30-disable-gettys ./edgy/30-disable-gettys 
But fdupes helped me already a lot to find the first bunch of identical files in xen-tools. :-)

5 May 2012

Axel Beckert: unburden-home-dir uploaded to Sid

Most popular web browsers cause quite a lot of I/O on a user's home directory, and their caches also take up quite some disk space; with Google's Chrome/Chromium you can't even configure how much disk space should be used for the cache. This causes unnecessary network traffic and no longer makes sense if the home directory itself comes over the network, e.g. via NFS or Samba. And on laptops it spins up the disks, unnecessarily costs battery power and therefore lowers the potential battery life. Such caches also cost scarce disk space on SSDs or flash cards, as common in laptops, netbooks and other mobile devices, and often get backed up without any real use. To take some of this burden off our NFS servers at work I started to develop an Xsession.d hook which moves such caches off to the local disk and puts symbolic links in their place in the user's home directory when the user logs in locally. This hook quickly became a standalone Perl script named unburden-home-dir, and the Xsession.d hook just a wrapper around it. Due to some unsolved issues I didn't feel it was good enough for Debian Unstable, so back then I uploaded it just to Debian Experimental. Pietro Abate's recent blog posting about unburden-home-dir on Planet Debian gave me the right kick to make another attempt at solving the remaining issues. And the mental distance gained over the time indeed helped, and I could fix the remaining issues. So I added some polish to the package and uploaded it to Debian Unstable. If you used the previous version from experimental, you have to take care of a few things. You can follow the development of unburden-home-dir also on GitHub and on Gitorious as well as on Ohloh. Enjoy!
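The principle behind the hook is simple enough to sketch in a few lines of shell. This is only an illustration of the idea with made-up paths, not unburden-home-dir's actual configuration or code:

#!/bin/sh
# Illustration only: move one browser cache off to a local scratch directory
# and leave a symlink behind in the (network-mounted) home directory.
SRC="$HOME/.cache/chromium"              # cache living on the NFS home
DST="/scratch/$USER/chromium-cache"      # assumed local, non-backed-up disk
if [ -d "$SRC" ] && [ ! -L "$SRC" ]; then
    mkdir -p "$DST"
    cp -a "$SRC/." "$DST/" && rm -rf "$SRC"
    ln -s "$DST" "$SRC"
fi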

14 April 2012

Axel Beckert: Automatically hardlinking duplicate files under /usr/share/doc with APT

On my everyday netbook (a very reliable first generation ASUS EeePC 701 4G) the disk (4 GB, as the product name suggests :-) is nearly always close to full. TL;DWTR? Jump directly to the HowTo. :-) So I came up with a few techniques to save some more disk space. Installing localepurge was one of the earliest. Another one was to implement aptitude filters to do interactively what deborphan does non-interactively. Yet another one is to use du and friends a lot; ncdu is definitely my favourite du-like tool in the meanwhile. Using du and friends I often noticed how much disk space /usr/share/doc takes up. But since I value the contents of /usr/share/doc a lot, I condemn how Nokia solved that on the N900: they let APT delete all files and directories under /usr/share/doc (including the copyright files!) via some package named docpurge. I also dislike Ubuntu's solution of truncating the shipped changelog files (you can still get the remainder of the files on the web somewhere), as they're an important source of information for me. So when aptitude showed me that some package suddenly wanted to use quite some more disk space, I noticed that the new package version included the upstream changelog twice. So I started searching for duplicate files under /usr/share/doc. There are quite a few tools to find duplicate files in Debian; hardlink seemed most appropriate for this case. First I just looked for duplicate files per package, which even on that less-than-four-gigabytes installation on my EeePC found nine packages which shipped at least one file twice. As recommended, I rather opted for asking for an according Lintian check (see the bugs). Niels Thykier kindly implemented such a check in Lintian and its findings are reported as the tags duplicate-changelog-files (severity: normal, from Lintian 2.5.2 on) and duplicate-files (severity: minor, experimental, from Lintian 2.5.0 on). Nevertheless, some source packages generate several binary packages and all of them (of course) ship the same, in some cases quite large, (Debian) changelog file. So I found myself running hardlink /usr/share/doc now and then to gain some more free disk space. But as I run Sid and package upgrades happen more than daily, I came to the conclusion that I should run this command more or less after each aptitude run, i.e. automatically. Taking localepurge's APT hook as an example, I added the following content as /etc/apt/apt.conf.d/98-hardlink-doc to my system:
// Hardlink identical docs, changelogs, copyrights, examples, etc
DPkg
{
  Post-Invoke {"if [ -x /usr/bin/hardlink ]; then /usr/bin/hardlink -t /usr/share/doc; else exit 0; fi";};
};
So now installing a package which contains duplicate files looks like this:
~ # aptitude install perl-tk
The following NEW packages will be installed:
  perl-tk 
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 2,522 kB of archives. After unpacking 6,783 kB will be used.
Get: 1 http://ftp.ch.debian.org/debian/ sid/main perl-tk i386 1:804.029-1.2 [2,522 kB]
Fetched 2,522 kB in 1s (1,287 kB/s)  
Selecting previously unselected package perl-tk.
(Reading database ... 121849 files and directories currently installed.)
Unpacking perl-tk (from .../perl-tk_1%3a804.029-1.2_i386.deb) ...
Processing triggers for man-db ...
Setting up perl-tk (1:804.029-1.2) ...
Mode:     real
Files:    15423
Linked:   3 files
Compared: 14724 files
Saved:    7.29 KiB
Duration: 4.03 seconds
localepurge: Disk space freed in /usr/share/locale: 0 KiB
localepurge: Disk space freed in /usr/share/man: 0 KiB
localepurge: Disk space freed in /usr/share/gnome/help: 0 KiB
localepurge: Disk space freed in /usr/share/omf: 0 KiB
Total disk space freed by localepurge: 0 KiB
Sure, that wasn't the most space-saving example, but on some installations I saved around 100 MB of disk space that way, and I still haven't found a case where this caused unwanted damage. (Use this advice at your own risk, though. Pointers to potential problems welcome. :-)

3 April 2012

Axel Beckert: Tools for CLI Road Warriors: Hidden Terminals

Some networks have no connection to the outside except that they allow surfing through an HTTP(S) proxy. Sometimes you are lucky and the HTTPS port (443) is unrestricted. The following server-side tools allow you to exploit these weaknesses and get you a shell on your server.

sslh

sslh is an SSH/SSL multiplexer. If a client connects to sslh, it checks whether the client speaks the SSH or the SSL protocol and then passes the connection on to the real port of the SSH daemon or of some SSL-enabled service, e.g. an HTTPS, OpenVPN, Tinc or XMPP server. That way it's possible to reach one of these services and SSH on the same port. The usual scenario where this daemon is useful is firewalls which block SSH, force HTTP to go through a proxy, but allow HTTPS connections without restriction. In that case you let sslh listen on the HTTPS port (443) and move the real HTTPS server (e.g. Apache) to listen on either a different port number (e.g. 442, 444 or 8443) or on another IP address, e.g. on localhost, port 443. On a Debian or Ubuntu based Apache HTTPS server, you just have to do the following to run Apache on port 442 and sslh on port 443 instead:
  1. apt-get install sslh as root.
  2. Edit /etc/default/sslh, change RUN=no to RUN=yes and --ssl 127.0.0.1:443 to --ssl 127.0.0.1:442.
  3. Edit /etc/apache2/ports.conf and all files in /etc/apache2/sites-available/ which contain a reference to port 443 (which is only /etc/apache2/sites-available/default-ssl.conf in the default configuration) and change all occurrences of 443 to 442.
  4. service apache2 restart
  5. service sslh start
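Optionally, on the client side you can save yourself the -p 443 with a ~/.ssh/config entry like the following (the host name is just the placeholder used in the example below):

Host your.server.example.org
    Port 443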
Now you should be able to ssh to your server on port 443 (ssh -p 443 your.server.example.org) while still being able to surf to https://your.server.example.org/. sslh works as a threaded or as a preforking daemon, or via inetd. It also honours tcpwrapper configurations for sshd in /etc/hosts.allow and /etc/hosts.deny. sslh is available as a port or package at least in Gentoo, in FreeBSD, in Debian and in Ubuntu.

AjaxTerm

AjaxTerm takes a completely different approach. It provides a terminal inside a web browser, with login and ssh as its server-side backend. Properly safeguarded by HTTPS plus maybe HTTP-based authentication, this can be an interesting emergency alternative to the more common but also more often blocked remote login mechanisms. AjaxTerm is available as a package at least in Debian and in Ubuntu.

Happily, I have never been forced to use either of them myself. :-)

22 March 2012

Axel Beckert: Tools for CLI Road Warriors: Tunnels

Sometimes the network you're connected to is either untrusted (e.g. wireless) or castrated in some way. In both cases you want a tunnel to your trusted home base. In the following I'll show you three completely different tunneling tools which may be helpful while travelling.

sshuttle

sshuttle is a tool somewhere in between automatic port forwarding and a VPN. It tunnels arbitrary TCP connections and DNS through an SSH tunnel without requiring root access on the remote end of the SSH connection. So it's perfect for redirecting most of your traffic through an SSH tunnel to your favourite SSH server, e.g. to ensure your local privacy when you are online via a public, unencrypted WLAN (i.e. easy to sniff for everyone). It runs on Linux and MacOS X and only needs a Python interpreter on the remote side. It requires root access (usually via sudo) on the client side, though. It's currently available at least in Debian Unstable and Testing (Wheezy) as well as in Ubuntu since 11.04 Natty. (A sketched example invocation follows at the end of this post.)

Miredo

Miredo is a free and open-source implementation of Microsoft's NAT-traversing Teredo IPv6 tunneling protocol for at least Linux, FreeBSD, NetBSD and MacOS X. Miredo includes not only a Teredo client but also a Teredo server implementation. The developer of Miredo also runs a public Miredo server, so you don't even need to install a server somewhere. If you run Debian or Ubuntu you just need to do apt-get install miredo as root and you have IPv6 connectivity. It's that easy. So it's perfect for getting a dynamic IPv6 tunnel for your laptop or mobile phone independently of where you are and without the need to register any IPv6 tunnel or configure the Miredo client. I usually use Miredo on my netbooks to be able to access my boxes at home (which are behind an IPv4 NAT router which is also a SixXS IPv6 tunnel endpoint) from wherever I am.

iodine

iodine is likely the most undermining tool in this set. It tunnels IPv4 over DNS, allowing you to make arbitrary network connections if you are on a network where nothing but DNS requests is allowed (i.e. only DNS packets reach the internet). This is often the case on wireless LANs with a landing page. They redirect all web traffic to the landing page, but the network's routers try to avoid poisoning the clients' DNS caches with different DNS replies than they would get after the user has logged in. So DNS packets usually pass even the local network's DNS servers unchanged; just TCP and other UDP packets are redirected until you log in. With an iodine tunnel, it is possible to get a network connection to the outside on such a network anyway. On startup iodine tries to automatically find the best parameters (MTU, request type, etc.) for the current environment. However, that may fail if any DNS server in between imposes DNS request rate limits. To be able to start such a tunnel you need to set up an iodine daemon somewhere on the internet. Choose a server which is not already a DNS server. iodine is available in many distributions, e.g. in Debian and in Ubuntu.
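As a rough sketch of how little setup sshuttle needs (user and host are placeholders for your own SSH server), the following routes all IPv4 traffic plus DNS lookups through the SSH connection:

$ sshuttle --dns -r user@your.server.example.org 0/0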

21 March 2012

Axel Beckert: Tools for CLI Road Warriors: Remote Shells

Most of my private online life happens on netbooks, and besides the web browser, SSH is my most used program, especially on netbooks. Accordingly I also have hosts on the net to which I connect via SSH. My most used program there is GNU Screen. So yes, for things like e-mail, IRC, and Jabber I connect to a running screen session on some host with a permanent internet connection. On those hosts there is usually one GNU Screen instance running permanently with either mutt or irssi (which is also my Jabber client via a Bitlbee gateway). But there are some other less well-known tools which I regard as useful in such a setup. The following two tools can both be seen as SSH for special occasions.

autossh

I already blogged about autossh, even twice, so I'll just recap the most important features here: autossh is a wrapper around SSH which regularly checks, via two tunnels connected to each other on the remote side, whether the connection is still alive, and if not, it kills the ssh and starts a new one with the same parameters (i.e. tunnels, port forwardings, commands to call, etc.). It's quite obvious that this is perfect to combine with screen's -R and -d options (see the sketch at the end of this post). I use autossh so often that I even adopted its Debian package.

mosh

Since last week there's a new kid in town^WDebian Unstable: mosh targets the same problems as autossh (unreliable networks, roaming, suspending the computer, etc.), just with a completely different approach which partially even obsoletes the usage of GNU Screen or tmux: while mosh uses plain SSH for authentication, authorization and key exchange, the final connection is an AES-128 encrypted UDP connection on a random port and is independent of the client's IP address. This allows mosh to have the following advantages: the connection stays up even if you're switching networks or suspending your netbook. So if you're just running a single text-mode application you don't even need GNU Screen or tmux. (You still do if you want the terminal multiplexing feature of GNU Screen or tmux.) Another nice feature, especially on unreliable WLAN connections or laggy GSM or UMTS connections, is mosh's output prediction based on its input (i.e. what is typed). Per line it tries to guess which server reaction a key press would cause, and if it detects a lagging connection, it shows the predicted result underlined until it gets the real result from the server. This eases writing mails in a remote mutt or chatting in a remote irssi, especially if you notice that you made a typo but can't remember how many backspaces you would have to type to fix it. Mosh needs to be installed on both client and server, but the server is only activated via SSH, so it has no port open unless a connection is started. And although (in Debian) mosh is currently just available in Unstable, the package builds fine on Squeeze, too. There's also a PPA for Ubuntu and of course you can also get the source code, e.g. as a git checkout from GitHub. mosh is still under heavy development and new features and bug fixes get added nearly every day. Thanks to Christine Spang for sponsoring and mentoring Keith's mosh package in Debian.
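As announced above, here is a sketch of the autossh-plus-screen combination; the monitoring port, host name and screen options are arbitrary placeholders to adapt:

$ autossh -M 20000 -t shellhost.example.org 'screen -dR'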

Steve Kemp: My code makes it into GNU Screen, and now you can use it. Possibly.

Via Axel Beckert I learned today that GNU Screen is 25 years old, and although development is slow it has not ceased. Back in 2008 I started to post about some annoyances with GNU Screen. At the time I posted a simple patch to implement the unbindall primitive. I posted some other patches and fixed a couple of bugs, but although there was some positive feedback initially, over time that ceased completely. Regrettably I didn't have the feeling there was the need to maintain a fork properly, so I quietly sighed, cried, and ceased. In 2009 my code was moved upstream into the GNU Screen repository (+documentation update). We're now in 2012. It looks like there might be a stable release of GNU Screen in the near future, which makes my code live "for real", but in the meantime the recent snapshot upload to Debian Experimental makes it available to the brave. 2008 - 2012. Four years to make my change visible to end-users. If I didn't use screen every day, and still have my own local version, I'd have forgotten about that entirely. Still I guess this makes today a happy day! Wheee! ObQuote: "Thanks. For a while there I thought you were keeping it a secret." - Escape To Victory
